This is an abridged version of the results. For more details of the LASSO models and interactive maps showing how coefficients for different variables vary spatially, please download Full_Results.html here and open it with your browser.
Are people more likely to connect online with others who share their political opinions? Do people unfriend those with an opposing political orientation? The relationship between political attitudes and social networks has been extensively examined in political science, sociology, and communication, mostly at the individual level. Scholars tend to approach political homophily as an outcome of individual psychological or behavioral inclinations. However, social networks are reflections of our lifeworlds, and politics are always grounded in local contexts. The local socio-political environment can influence people's online connections, but such ecosystem variables are less explored in existing research.
Focusing on county-level social media connection allows us to bring in contextual variables to test factors that are positively/negatively associated with cross-cutting social connections. Geographically weighted regression further allows us to explore the variation of these relationships across space, achieving a nuanced depiction of factors influencing cross-cutting connections in counties all over the U.S.
Cross-cutting connection rate on Facebook: a county’s ratio of cross-cutting to likeminded Facebook friends with other counties.
Cross-cutting connection rate based on human mobility data: a county’s ratio of cross-cutting to likeminded physical travel to other counties.
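As a minimal sketch of how the rate formula given in this section could be evaluated, the function below sums the cross-cutting-to-likeminded odds over county pairs, weighted by connectivity. The function name, the list-based inputs, and the choice to skip a county's connection with itself are assumptions for illustration, not the project's actual code.

```python
def cross_cutting_rate(i, dem, gop, connectivity):
    """Cross-cutting connectivity rate of county i.

    dem, gop: per-county Democratic and Republican vote shares (hypothetical inputs).
    connectivity: square matrix; connectivity[i][j] is the tie strength between
    counties i and j (Facebook friendships or physical trips).
    """
    total = 0.0
    for j in range(len(dem)):
        if j == i:
            continue  # assumption: only ties with *other* counties are counted
        # Probability that a random pair of voters from i and j is likeminded.
        likeminded = dem[i] * dem[j] + gop[i] * gop[j]
        # Odds of a cross-cutting pair; undefined if likeminded == 0.
        total += (1.0 - likeminded) / likeminded * connectivity[i][j]
    return total


# Toy data: three counties with made-up vote shares and tie strengths.
dem = [0.6, 0.4, 0.5]
gop = [0.4, 0.6, 0.5]
connectivity = [[0, 2, 1], [2, 0, 3], [1, 3, 0]]
rate_0 = cross_cutting_rate(0, dem, gop, connectivity)
```

The same function applies to either dependent variable; only the connectivity matrix changes.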
$$\mathrm{CrossCuttingConnectivityRate}_{i} = \sum_{j=0}^{n-1} \frac{1 - (\mathrm{DemRatio}_{i}\,\mathrm{DemRatio}_{j} + \mathrm{GOPRatio}_{i}\,\mathrm{GOPRatio}_{j})}{\mathrm{DemRatio}_{i}\,\mathrm{DemRatio}_{j} + \mathrm{GOPRatio}_{i}\,\mathrm{GOPRatio}_{j}} \times \mathrm{Connectivity}_{ij}$$
Based on data provided by the Census Bureau
Total population: Total population according to the U.S. census 2021 population estimate.
Birth rate: Birth rate in period 7/1/2020 to 6/30/2021.
Death rate: Death rate in period 7/1/2020 to 6/30/2021.
International migration rate: Net international migration rate (ratio of net international migration to population number) in period 7/1/2020 to 6/30/2021.
Domestic migration rate: Net domestic migration rate (ratio of net domestic migration to population number) in period 7/1/2020 to 6/30/2021.
Male proportion: The proportion of males in the general population according to the U.S. census 2021 population estimate.
Based on data provided by the Census Bureau
Mixed proportion: Proportion of two or more races population according to the U.S. census 2021 population estimate.
White and combined proportion: Proportion of white alone or in combination population according to the U.S. census 2021 population estimate.
Black and combined proportion: Proportion of black alone or in combination population according to the U.S. census 2021 population estimate.
Indigenous and combined proportion: Proportion of American Indian and Alaska Native alone or in combination population according to the U.S. census 2021 population estimate.
AANHPI proportion: Proportion of Asian American, Native Hawaiian, and Pacific Islander alone or in combination population according to the U.S. census 2021 population estimate.
Hispanic proportion: Proportion of Hispanic population according to the U.S. census 2021 population estimate.
Based on data provided by the Bureau of Economic Analysis
Education: A weighted index calculated from the percentage of adults at each education level: Pre_highschool_rate × 1 + Highschool_rate × 2 + Some_college_rate × 3 + Bachelor_plus_rate × 4.
Median household income: Estimate of median household income, 2021
Unemployment rate 2021: Annual number of unemployed people in 2021 divided by the U.S. census 2021 population estimate.
Unemployment rate 3 years: Mean of 2019-2021 annual unemployment divided by the U.S. census 2021 population estimate.
Unemployment rate 5 years: Mean of 2017-2021 annual unemployment divided by the U.S. census 2021 population estimate.
Unemployment rate 10 years: Mean of 2012-2021 annual unemployment divided by the U.S. census 2021 population estimate.
Per capita GDP 2021: Real 2021 per capita gross domestic product in thousands of chained (2012) dollars.
GDP change 2021: Real gross domestic product change between 2020 and 2021.
Per capita GDP 3 years: Average of real per capita 2019-2021 gross domestic product in thousands of chained (2012) dollars.
Poverty rate: Estimated number of people of all ages in poverty in 2021, divided by the U.S. census 2021 population estimate.
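The education index in the list above can be illustrated with a short sketch. It assumes the digits 1 to 4 in the definition are multiplicative weights on the four attainment rates; the dictionary keys and function name are hypothetical.

```python
# Assumed weights: higher attainment levels contribute more to the index.
EDUCATION_WEIGHTS = {
    "pre_highschool": 1,
    "highschool": 2,
    "some_college": 3,
    "bachelor_plus": 4,
}


def education_index(rates):
    """Weighted sum of the share of adults at each education level."""
    return sum(weight * rates[level] for level, weight in EDUCATION_WEIGHTS.items())


# Made-up county: 10% below high school, 30% in each of the other levels.
example_index = education_index(
    {"pre_highschool": 0.10, "highschool": 0.30, "some_college": 0.30, "bachelor_plus": 0.30}
)
```

With shares that sum to 1, the index ranges from 1 (no adult finished high school) to 4 (every adult holds at least a bachelor's degree).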
Based on 2020 election results data provided by the MIT Election Data and Science Lab
Distance-based local partisan difference: A county’s difference from nearby counties in 2020 Presidential Election results, weighted by the county’s distance to those counties. The weights are Gaussian kernel weights with an adaptive bandwidth (a function of unit density).
Neighbor-based local partisan difference: A county’s difference with adjacent counties (sharing either border or vertex) in 2020 Presidential Election results.
Democrat-Republican ratio: A county’s Democratic-to-Republican vote share in the 2020 Presidential Election.
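To make the distance-based measure above more concrete, here is a minimal sketch of Gaussian kernel weighting with an adaptive bandwidth. Two points are assumptions rather than details from the source: the bandwidth is taken as the distance to the k-th nearest county, and the "difference" is taken as the absolute gap in Democratic vote share. All names are hypothetical.

```python
import math


def gaussian_weights(distances, k):
    """Gaussian kernel weights for one focal county.

    Assumption: the adaptive bandwidth is the distance to the k-th nearest
    county, so it widens in sparse areas and narrows in dense ones.
    """
    bandwidth = sorted(distances)[k - 1]
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]


def local_partisan_difference(own_share, other_shares, distances, k):
    """Kernel-weighted mean absolute difference in vote share."""
    weights = gaussian_weights(distances, k)
    weighted_sum = sum(w * abs(own_share - s) for w, s in zip(weights, other_shares))
    return weighted_sum / sum(weights)


# Toy example: two equally distant counties get equal weight.
difference = local_partisan_difference(0.5, [0.4, 0.7], [10.0, 10.0], k=1)
```

Closer counties always receive larger weights, so a county surrounded by politically similar neighbors scores low even if distant counties differ sharply.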
Based on the UNC News Desert project
Newspaper publication days: Number of days in a week that any local newspaper publishes.
Newspaper count: Number of local newspapers in a county.
Number of newspapers not owned by state conglomerates: Number of newspapers not owned by conglomerates defined as organizations that own newspapers in 2 or more states.
Number of newspapers not owned by county conglomerates: Number of newspapers not owned by conglomerates defined as organizations that own newspapers in 3 or more counties.
TV count: Number of public TV stations in a county.
Radio count: Number of public radio stations in a county.
Original content radio count: Number of public radio stations in a county that produce original content.
Non-original content radio count: Number of public radio stations in a county that do NOT produce original content.
Based on the Social Capital Atlas dataset
Cross-class connectedness: Calculated based on Facebook friend network – two times the share of high-SES friends among low-SES individuals, averaged over all low-SES individuals in the county.
Cross-class exposure: Mean exposure to high-SES individuals by county for low-SES individuals – two times the average share of high-SES individuals in individuals’ groups, averaged over low-SES users.
Clustering: The average fraction of an individual’s friend pairs who are also friends with each other.
Support ratio: The proportion of within-county friendships where the pair of friends share a third mutual friend within the same county.
Volunteering rate: The percentage of Facebook users who are members of a group which is predicted to be about ‘volunteering’ or ‘activism’ based on group title and other group characteristics.
Civic organization density: The number of Facebook Pages predicted to be “Public Good” pages based on page title, category, and other page characteristics, per 1,000 users in the county.
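The clustering measure defined above can be sketched on a toy friendship graph. The dictionary-of-sets representation and the convention that a person with fewer than two friends contributes zero are assumptions for illustration; the Social Capital Atlas may handle such cases differently.

```python
from itertools import combinations


def local_clustering(friends, person):
    """Fraction of a person's friend pairs who are also friends with each other."""
    pairs = list(combinations(sorted(friends[person]), 2))
    if not pairs:
        return 0.0  # assumption: fewer than two friends counts as zero
    linked = sum(1 for a, b in pairs if b in friends[a])
    return linked / len(pairs)


def average_clustering(friends):
    """Average of the local clustering values over all individuals."""
    return sum(local_clustering(friends, p) for p in friends) / len(friends)


# Toy graph: a, b, c form a triangle; d is connected to a only.
friends = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
county_clustering = average_clustering(friends)
```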
Given the large number of variables and geographically weighted regression's tendency to overfit, we first built LASSO models to filter out insignificant variables. We retained the variables whose phi value was larger than .01 in the first model and .005 in the second and third models. After multicollinearity tests, these are the variables we selected for the GWR model:
International migration rate
AANHPI proportion
Hispanic proportion
Per capita GDP 3 years
Poverty rate
Democrat-Republican ratio
Distance-based local partisan difference
Newspaper publication days
Original content radio count
Cross-class friendship
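The LASSO screening step described above can be illustrated with a small, dependency-free sketch. It uses plain coordinate descent and assumes the selection threshold applies to the absolute size of each fitted coefficient; the source calls the thresholded quantity the phi value, and its exact definition is not given here. Variable names and data are made up.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + penalty * ||b||_1 by cycling over features."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the partial residual
            norm = 0.0  # squared length of feature j
            for row, target in zip(X, y):
                fitted_without_j = sum(beta[m] * row[m] for m in range(p) if m != j)
                rho += row[j] * (target - fitted_without_j)
                norm += row[j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return beta


def select_variables(names, beta, threshold):
    """Keep variables whose absolute coefficient exceeds the threshold (assumed rule)."""
    return [name for name, b in zip(names, beta) if abs(b) > threshold]


# Toy standardized design with three orthogonal predictors; y depends strongly
# on x1 and only very weakly on x3.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -2.05, -1.95]
beta = lasso_coordinate_descent(X, y, penalty=0.1)
kept = select_variables(["x1", "x2", "x3"], beta, threshold=0.01)
```

In this toy case the penalty shrinks the weak and irrelevant predictors to exactly zero, which is the property that makes LASSO useful as a pre-filter before fitting a parameter-hungry model such as GWR.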
===========================================================================
Model type                                                         Gaussian
Number of observations:                                                3112
Number of covariates:                                                    15

Global Regression Results
---------------------------------------------------------------------------
Residual sum of squares:                                           2356.620
Log-likelihood:                                                   -3983.111
AIC:                                                               7996.222
AICc:                                                              7998.398
BIC:                                                             -22552.615
R2:                                                                   0.243
Adj. R2:                                                              0.239

Variable                              Est.         SE  t(Est/SE)    p-value
------------------------------- ---------- ---------- ---------- ----------
X0                                  -0.000      0.016     -0.000      1.000
X1                                  -0.059      0.025     -2.331      0.020
X2                                   0.106      0.017      6.266      0.000
X3                                  -0.025      0.019     -1.356      0.175
X4                                  -0.024      0.016     -1.481      0.139
X5                                  -0.205      0.023     -8.922      0.000
X6                                   0.006      0.025      0.235      0.814
X7                                  -0.008      0.021     -0.361      0.718
X8                                  -0.026      0.017     -1.478      0.139
X9                                  -0.091      0.023     -4.017      0.000
X10                                  0.156      0.020      7.703      0.000
X11                                  0.359      0.019     18.832      0.000
X12                                 -0.076      0.022     -3.505      0.000
X13                                  0.156      0.024      6.510      0.000
X14                                  0.034      0.018      1.834      0.067

Geographically Weighted Regression (GWR) Results
---------------------------------------------------------------------------
Spatial kernel:                                           Adaptive bisquare
Bandwidth used:                                                     200.000

Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares:                                            572.013
Effective number of parameters (trace(S)):                          526.031
Degree of freedom (n - trace(S)):                                  2585.969
Sigma estimate:                                                       0.470
Log-likelihood:                                                   -1780.093
AIC:                                                               4614.249
AICc:                                                              4829.645
BIC:                                                               7799.110
R2:                                                                   0.816
Adjusted R2:                                                          0.779
Adj. alpha (95%):                                                     0.001
Adj. critical t value (95%):                                          3.192

Summary Statistics For GWR Parameter Estimates
---------------------------------------------------------------------------
Variable                   Mean        STD        Min     Median        Max
-------------------- ---------- ---------- ---------- ---------- ----------
X0                        0.126      0.527     -1.463      0.160      1.425
X1                       -0.166      0.428     -2.500     -0.110      1.882
X2                        0.047      0.144     -0.316      0.030      0.639
X3                       -0.003      0.128     -0.752      0.000      0.437
X4                       -6.830     12.320    -54.989     -5.017     24.648
X5                       -0.193      0.250     -1.028     -0.205      0.566
X6                        0.042      0.534     -1.064     -0.066      2.979
X7                        0.148      0.328     -0.967      0.170      1.045
X8                        0.153      0.327     -0.791      0.142      1.466
X9                        0.073      0.177     -0.640      0.075      0.645
X10                       0.776      1.037     -0.475      0.439      5.555
X11                       0.139      0.143     -0.197      0.128      0.633
X12                      -0.038      0.111     -0.379     -0.025      0.392
X13                       0.196      0.222     -0.140      0.131      1.192
X14                      -0.039      0.115     -0.918     -0.029      0.685
===========================================================================
===========================================================================
Model type                                                         Gaussian
Number of observations:                                                3112
Number of covariates:                                                    15

Global Regression Results
---------------------------------------------------------------------------
Residual sum of squares:                                           1589.540
Log-likelihood:                                                   -3370.383
AIC:                                                               6770.766
AICc:                                                              6772.942
BIC:                                                             -23319.696
R2:                                                                   0.489
Adj. R2:                                                              0.487

Variable                              Est.         SE  t(Est/SE)    p-value
------------------------------- ---------- ---------- ---------- ----------
X0                                   0.000      0.013      0.000      1.000
X1                                  -0.014      0.021     -0.672      0.501
X2                                   0.042      0.014      3.052      0.002
X3                                  -0.006      0.015     -0.427      0.670
X4                                  -0.022      0.013     -1.712      0.087
X5                                  -0.149      0.019     -7.862      0.000
X6                                   0.111      0.020      5.523      0.000
X7                                  -0.027      0.017     -1.595      0.111
X8                                  -0.037      0.014     -2.634      0.008
X9                                  -0.128      0.019     -6.907      0.000
X10                                  0.410      0.017     24.709      0.000
X11                                  0.282      0.016     18.042      0.000
X12                                 -0.068      0.018     -3.823      0.000
X13                                  0.124      0.020      6.326      0.000
X14                                  0.091      0.015      5.985      0.000

Geographically Weighted Regression (GWR) Results
---------------------------------------------------------------------------
Spatial kernel:                                           Adaptive bisquare
Bandwidth used:                                                     200.000

Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares:                                            253.669
Effective number of parameters (trace(S)):                          526.031
Degree of freedom (n - trace(S)):                                  2585.969
Sigma estimate:                                                       0.313
Log-likelihood:                                                    -514.862
AIC:                                                               2083.787
AICc:                                                              2299.183
BIC:                                                               5268.648
R2:                                                                   0.918
Adjusted R2:                                                          0.902
Adj. alpha (95%):                                                     0.001
Adj. critical t value (95%):                                          3.192

Summary Statistics For GWR Parameter Estimates
---------------------------------------------------------------------------
Variable                   Mean        STD        Min     Median        Max
-------------------- ---------- ---------- ---------- ---------- ----------
X0                        0.116      0.415     -1.221      0.141      1.049
X1                       -0.059      0.315     -1.219     -0.059      2.418
X2                        0.011      0.099     -0.331      0.009      0.324
X3                        0.002      0.094     -0.718      0.004      0.407
X4                       -2.483      7.869    -31.598     -2.461     27.299
X5                       -0.081      0.144     -0.430     -0.099      0.393
X6                        0.171      0.451     -0.567      0.051      2.961
X7                        0.126      0.282     -0.847      0.151      0.647
X8                        0.048      0.241     -0.749      0.047      1.158
X9                       -0.022      0.145     -0.723     -0.018      0.463
X10                       1.008      0.893     -0.200      0.773      4.112
X11                       0.093      0.109     -0.128      0.086      0.410
X12                      -0.018      0.077     -0.300     -0.012      0.273
X13                       0.083      0.127     -0.216      0.055      0.881
X14                       0.004      0.073     -1.048      0.010      0.336
===========================================================================
===========================================================================
Model type                                                         Gaussian
Number of observations:                                                3112
Number of covariates:                                                    12

Global Regression Results
---------------------------------------------------------------------------
Residual sum of squares:                                            419.628
Log-likelihood:                                                   -1298.055
AIC:                                                               2620.110
AICc:                                                              2622.228
BIC:                                                             -24513.736
R2:                                                                   0.865
Adj. R2:                                                              0.865

Variable                              Est.         SE  t(Est/SE)    p-value
------------------------------- ---------- ---------- ---------- ----------
X0                                   0.000      0.007      0.000      1.000
X1                                   0.704      0.008     93.082      0.000
X2                                  -0.034      0.007     -4.808      0.000
X3                                  -0.008      0.010     -0.835      0.404
X4                                   0.099      0.008     13.160      0.000
X5                                  -0.014      0.007     -2.031      0.042
X6                                  -0.066      0.010     -6.942      0.000
X7                                   0.306      0.008     36.120      0.000
X8                                   0.030      0.008      3.609      0.000
X9                                  -0.006      0.009     -0.738      0.460
X10                                  0.028      0.009      3.070      0.002
X11                                  0.068      0.008      8.776      0.000

Geographically Weighted Regression (GWR) Results
---------------------------------------------------------------------------
Spatial kernel:                                           Adaptive bisquare
Bandwidth used:                                                     200.000

Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares:                                            121.951
Effective number of parameters (trace(S)):                          432.233
Degree of freedom (n - trace(S)):                                  2679.767
Sigma estimate:                                                       0.213
Log-likelihood:                                                     624.776
AIC:                                                               -383.087
AICc:                                                              -242.579
BIC:                                                               2234.946
R2:                                                                   0.961
Adjusted R2:                                                          0.954
Adj. alpha (95%):                                                     0.001
Adj. critical t value (95%):                                          3.200

Summary Statistics For GWR Parameter Estimates
---------------------------------------------------------------------------
Variable                   Mean        STD        Min     Median        Max
-------------------- ---------- ---------- ---------- ---------- ----------
X0                        0.063      0.214     -0.476      0.042      0.933
X1                        0.508      0.152      0.034      0.523      0.895
X2                       -0.014      0.047     -0.187     -0.013      0.114
X3                        0.010      0.074     -0.221      0.012      0.199
X4                        0.140      0.155     -0.155      0.101      1.185
X5                       -0.001      0.115     -0.374      0.005      0.351
X6                       -0.053      0.071     -0.357     -0.045      0.127
X7                        0.673      0.558     -0.255      0.593      2.620
X8                        0.020      0.052     -0.161      0.016      0.214
X9                       -0.000      0.038     -0.111     -0.001      0.138
X10                      -0.014      0.064     -0.280     -0.008      0.237
X11                       0.021      0.057     -0.320      0.019      0.246
===========================================================================
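The adjusted alpha and adjusted critical t value reported in the summaries can be approximately reproduced by hand. One widely used multiple-testing correction for GWR divides the nominal alpha by the ratio of the effective number of parameters to the number of covariates; the sketch below assumes that is the correction behind the reported figures. It also uses a normal approximation in place of the t distribution, which is an assumption that costs only a few thousandths at roughly 3,000 degrees of freedom.

```python
from statistics import NormalDist


def adjusted_alpha(alpha, effective_params, n_covariates):
    """Nominal alpha divided by (effective number of parameters / covariates)."""
    return alpha / (effective_params / n_covariates)


def critical_value(alpha_adj):
    """Two-sided critical value under a normal approximation (assumption)."""
    return NormalDist().inv_cdf(1 - alpha_adj / 2)


# Models with 15 covariates and trace(S) = 526.031.
alpha_15 = adjusted_alpha(0.05, 526.031, 15)
crit_15 = critical_value(alpha_15)

# Model with 12 covariates and trace(S) = 432.233.
alpha_12 = adjusted_alpha(0.05, 432.233, 12)
crit_12 = critical_value(alpha_12)
```

Both adjusted alphas round to the reported 0.001, and the approximate critical values land within a few thousandths of the reported 3.192 and 3.200. In practice this means a local coefficient should be read as significant only where its absolute t value exceeds about 3.2 rather than the usual 1.96.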